1 RENO: A Rename-Based Instruction Optimizer
May 2005 ACM SIGARCH Computer Architecture News, Proceedings of the 32nd
Annual International Symposium on Computer Architecture ISCA '05,
Volume 33 Issue 2

Publisher: IEEE Computer Society, ACM Press 

Full text available: pdf(256.23 KB) Additional Information: full citation, abstract

RENO is a modified MIPS R10000 register renamer that uses map-table "short-circuiting" 
to implement dynamic versions of several well-known static optimizations: move 
elimination, common subexpression elimination, register allocation, and constant folding. 
Because it implements these optimizations dynamically, RENO can apply optimizations in 
certain situations where static compilers cannot. Several of RENO's component
optimizations have been previously proposed as independent mechanisms. Unified ... 

Brian Fahs, Satarupa Bose, Matthew Crum, Brian Slechta, Francesco Spadini, Tony Tung,
Sanjay J. Patel, Steven S. Lumetta 

December 2001 Proceedings of the 34th annual ACM/IEEE international symposium 
on Microarchitecture 

Publisher: IEEE Computer Society 

We evaluate the rePLay microarchitecture as a means for reducing application execution
time by facilitating dynamic optimization. The framework contains a programmable 
optimization engine coupled with a hardware-based recovery mechanism. The 
optimization engine enables the dynamic optimizer to run concurrently with program 
execution. The recovery mechanism enables the optimizer to make speculative 
optimizations without requiring recovery code. We demonstrate that a rePLay configuration 
performing ... 

We propose a scheme for transient-fault recovery called Simultaneously and
Redundantly Threaded processors with Recovery (SRTR) that enhances a 
previously proposed scheme for transient-fault detection, called Simultaneously and 
Redundantly Threaded (SRT) processors. SRT replicates an application into two 
communicating threads, one executing ahead of the other. The trailing thread repeats the 
computation performed by the leading thread, and the values produced by the two 
threads are compar ... 

4 Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window
Processors

Haitham Akkary, Ravi Rajwar, Srikanth T. Srinivasan 

December 2003 Proceedings of the 36th annual IEEE/ACM International Symposium 

on Microarchitecture 
Publisher: IEEE Computer Society 

Full text available: pdf(419.08 KB) Additional Information: full citation, abstract, citings, index terms

Large instruction window processors achieve high performance by exposing large amounts
of instruction level parallelism. However, accessing large hardware structures typically
required to buffer and process such instruction window sizes significantly degrade the
cycle time. This paper proposes a novel Checkpoint Processing and Recovery (CPR)
microarchitecture, and shows how to implement a large instruction window processor
without requiring large structures thus permitting a high clock frequency. We fo ...

5 Register integration: a simple and efficient implementation of squash reuse
Amir Roth, Gurindar S. Sohi 

December 2000 Proceedings of the 33rd annual ACM/IEEE international symposium 

on Microarchitecture 
Publisher: ACM Press 
Full text available: pdf(154.98 KB), ps(573.81 KB) Additional Information: full citation, references, citings, index terms

6 Dynamic translation: The Transmeta Code Morphing™ Software: using speculation,
recovery, and adaptive retranslation to address real-life challenges

James C. Dehnert, Brian K. Grant, John P. Banning, Richard Johnson, Thomas Kistler,
Alexander Klaiber, Jim Mattson

March 2003 Proceedings of the international symposium on Code generation and
optimization: feedback-directed and runtime optimization CGO '03

Publisher: IEEE Computer Society 

Full text available: pdf(988.25 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Transmeta's Crusoe microprocessor is a full, system-level implementation of the x86 
architecture, comprising a native VLIW microprocessor with a software layer, the Code 
Morphing Software (CMS), that combines an interpreter, dynamic binary translator, 
optimizer, and runtime system. In its general structure, CMS resembles other binary 
translation systems described in the literature, but it is unique in several respects. The 
wide range of PC workloads that CMS must handle gracefully in real ... 

Keywords: binary translation, dynamic optimization, dynamic translation, emulation, 

7 An analysis of a resource efficient checkpoint architecture
Haitham Akkary, Ravi Rajwar, Srikanth T. Srinivasan 

December 2004 ACM Transactions on Architecture and Code Optimization (TACO), 

Volume 1 Issue 4 

Publisher: ACM Press 

Full text available: pdf(757.69 KB) Additional Information: full citation, abstract, references, index terms

Large instruction window processors achieve high performance by exposing large amounts 
of instruction level parallelism. However, accessing large hardware structures typically 
required to buffer and process such instruction window sizes significantly degrade the 
cycle time. This paper proposes a novel checkpoint processing and recovery (CPR) 
microarchitecture, and shows how to implement a large instruction window processor 
without requiring large structures thus permitting a high clock frequency ... 

Keywords: Computer architecture, checkpoint architecture, high-performance 
computing, scalable architecture 





8 Transient-fault recovery for chip multiprocessors

Mohamed Gomaa, Chad Scarbrough, T. N. Vijaykumar, Irith Pomeranz
May 2003 ACM SIGARCH Computer Architecture News, Proceedings of the 30th
annual international symposium on Computer architecture ISCA '03, Volume
31 Issue 2

Publisher: ACM Press 

Full text available: pdf(370.75 KB) Additional Information: full citation, abstract, references, citings

To address the increasing susceptibility of commodity chip multiprocessors (CMPs) to 
transient faults, we propose Chip-level Redundantly Threaded multiprocessor with
Recovery (CRTR). CRTR extends the previously-proposed CRT for transient-fault detection 
in CMPs, and the previously-proposed SRTR for transient-fault recovery in SMT. All these 
schemes achieve fault tolerance by executing and comparing two copies, called leading 
and trailing threads, of a given application. Previous recovery schemes ... 

9 Increasing Register File Immunity to Transient Errors
Gokhan Memik, Mahmut T. Kandemir, Ozcan Ozturk

March 2005 Proceedings of the conference on Design, Automation and Test in Europe 
- Volume 1 

Publisher: IEEE Computer Society 

Full text available: pdf(162.75 KB) Additional Information: full citation, abstract

Transient errors are one of the major reasons for system downtime in many systems. 
While prior research has mainly focused on the impact of transient errors on datapath, 
caches and main memories, the register file has largely been neglected. Since the register 
file is accessed very frequently, the probability of transient errors is high. In addition, 
errors in it can quickly spread to different parts of the system, and cause application crash 
or silent data corruption. This paper addresses the r ... 

10 Using speculative retirement and larger instruction windows to narrow the
performance gap between memory consistency models
Parthasarathy Ranganathan, Vijay S. Pai, Sarita V. Adve

June 1997 Proceedings of the ninth annual ACM symposium on Parallel algorithms 
and architectures 

Publisher: ACM Press 

Full text available: pdf(1.83 MB) Additional Information: full citation, references, citings, index terms

11 Register file and memory system design: Reducing register ports for higher speed
and lower energy

Il Park, Michael D. Powell, T. N. Vijaykumar

November 2002 Proceedings of the 35th annual ACM/IEEE international symposium 

on Microarchitecture 
Publisher: IEEE Computer Society Press 

Full text available: pdf, Publisher Site Additional Information: full citation, abstract, references, citings, index terms

The key issues for register file design in high-performance processors are access time and 
energy. While previous work has focused on reducing the number of registers, we propose 
to reduce the number of register ports through two proposals, one for reads and the other 
for writes. For reads, we propose bypass hint to reduce register port requirements by 
avoiding unnecessary register file reads for cases where values are bypassed. Current 
processors are unable to avoid these unnecessary reads due ... 

12 Fingerprinting: bounding soft-error detection latency and bandwidth 

Jared C. Smolens, Brian T. Gold, Jangwoo Kim, Babak Falsafi, James C. Hoe, Andreas G.
Nowatzyk

October 2004 ACM SIGPLAN Notices, ACM SIGARCH Computer Architecture News,
ACM SIGOPS Operating Systems Review, Proceedings of the 11th
international conference on Architectural support for programming
languages and operating systems ASPLOS-XI, Volume 39, 32, 38 Issue 11, 5, 5

Publisher: ACM Press 

Full text available: pdf(229.65 KB) Additional Information: full citation, abstract, references, index terms

Recent studies have suggested that the soft-error rate in microprocessor logic will become 
a reliability concern by 2010. This paper proposes an efficient error detection technique, 
called fingerprinting, that detects differences in execution across a dual modular 
redundant (DMR) processor pair. Fingerprinting summarizes a processor's execution 
history in a hash-based signature; differences between two mirrored processors are 
exposed by comparing their fingerprints. Fingerprinting tightly ... 

Keywords: backwards error recovery (BER), dual modular redundancy (DMR), error 
detection, soft errors 



13 An out-of-order execution technique for runtime binary translators 
Bich C. Le

October 1998 ACM SIGOPS Operating Systems Review, ACM SIGPLAN Notices,
Proceedings of the eighth international conference on Architectural
support for programming languages and operating systems ASPLOS-VIII,
Volume 32, 33 Issue 5, 11

Publisher: ACM Press 

Full text available: pdf(1.04 MB) Additional Information: full citation, abstract, references, citings, index terms

A dynamic translator emulates an instruction set architecture by translating source 
instructions to native code during execution. On statically-scheduled hardware, higher 
performance can potentially be achieved by reordering the translated instructions; 
however, this is a challenging transformation if the source architecture supports precise 
exception semantics, and the user-level program is allowed to register exception 
handlers. This paper presents a software technique which allows a translato ... 

14 Superscalar design: Cherry: checkpointed early resource recycling in out-of-order
microprocessors 

Jose F. Martinez, Jose Renau, Michael C. Huang, Milos Prvulovic, Josep Torrellas 
November 2002 Proceedings of the 35th annual ACM/IEEE international symposium 

on Microarchitecture 
Publisher: IEEE Computer Society Press 

Full text available: pdf, Publisher Site Additional Information: full citation, abstract, references, citings, index terms, review

This paper presents CHeckpointed Early Resource Recycling (Cherry), a hybrid mode of 
execution based on ROB and checkpointing that decouples resource recycling and 
instruction retirement. Resources are recycled early, resulting in a more efficient 
utilization. Cherry relies on state checkpointing and rollback to service exceptions for 
instructions whose resources have been recycled. Cherry leverages the ROB to (1) not 
require in-order execution as a fallback mechanism, (2) allow memory re ... 

15 Toward kilo-instruction processors 

Adrian Cristal, Oliverio J. Santana, Mateo Valero, Jose F. Martinez
December 2004 ACM Transactions on Architecture and Code Optimization (TACO), 

Volume 1 Issue 4 

Publisher: ACM Press 

Full text available: pdf(1.16 MB) Additional Information: full citation, abstract, references, index terms

The continuously increasing gap between processor and memory speeds is a serious 
limitation to the performance achievable by future microprocessors. Currently, processors 
tolerate long-latency memory operations largely by maintaining a high number of in-flight 
instructions. In the future, this may require supporting many hundreds, or even
thousands, of in-flight instructions. Unfortunately, the traditional approach of scaling up 
critical processor structures to provide such support is impractica ... 

Keywords: Memory wall, instruction-level parallelism, kilo-instruction processors, 
multicheckpointing 



16 Enhancing software reliability with speculative threads 
Jeffrey Oplinger, Monica S. Lam 

October 2002 ACM SIGPLAN Notices, ACM SIGOPS Operating Systems Review, ACM
SIGARCH Computer Architecture News, Proceedings of the 10th
international conference on Architectural support for programming
languages and operating systems ASPLOS-X, Volume 37, 36, 30 Issue 10, 5, 5
Publisher: ACM Press 

Full text available: pdf(1.47 MB) Additional Information: full citation, abstract, references, citings

This paper advocates the use of a monitor-and-recover programming paradigm to 
enhance the reliability of software, and proposes an architectural design that allows 
software and hardware to cooperate in making this paradigm more efficient and easier to 
program. We propose that programmers write monitoring functions assuming simple 
sequential execution semantics. Our architecture speeds up the computation by executing 
the monitoring functions speculatively in parallel with the main computation. For ... 

17 Novel ideas: Skipper: a microarchitecture for exploiting control-flow independence 
Chen-Yong Cher, T. N. Vijaykumar 

December 2001 Proceedings of the 34th annual ACM/IEEE international symposium 

on Microarchitecture 
Publisher: IEEE Computer Society 

Full text available: pdf Additional Information: full citation, abstract, references, citings

Although modern superscalar processors achieve high branch prediction accuracy, certain
branches either are inherently difficult to predict or incur destructive interference in 
prediction tables, causing significant performance loss due to mispredictions. We propose 
a novel microarchitecture, called Skipper, to handle such difficult branches by exploiting 
control-flow independence. Previous approaches to handling difficult branches, one way or 
another, amount to executing incorrect instructions, ... 



18 The taser intrusion recovery system 

Ashvin Goel, Kenneth Po, Kamran Farhadi, Zheng Li, Eyal de Lara

October 2005 ACM SIGOPS Operating Systems Review, Proceedings of the twentieth
ACM symposium on Operating systems principles SOSP '05, Volume 39 Issue 5

Publisher: ACM Press 

Full text available: pdf(346.32 KB) Additional Information: full citation, abstract, references, index terms

Recovery from intrusions is typically a very time-consuming operation in current systems. 
At a time when the cost of human resources dominates the cost of computing resources, 
we argue that next generation systems should be built with automated intrusion recovery 
as a primary goal. In this paper, we describe the design of Taser, a system that helps in
selectively recovering legitimate file-system data after an attack or local damage occurs. 
Taser reverts tainted, i.e. attack-dependent, file-syst ... 

Keywords: file systems, intrusion analysis, intrusion recovery, snapshots 



19 Selective eager execution on the PolyPath architecture 

Artur Klauser, Abhijit Paithankar, Dirk Grunwald
April 1998 ACM SIGARCH Computer Architecture News, Proceedings of the 25th
annual international symposium on Computer architecture ISCA '98, Volume
26 Issue 3

Publisher: IEEE Computer Society, ACM Press 

Full text available: Publisher Site Additional Information: full citation, abstract, references, citings, index terms

Control-flow misprediction penalties are a major impediment to high performance in wide- 
issue superscalar processors. In this paper we present Selective Eager Execution (SEE), 
an execution model to overcome mis-speculation penalties by executing both paths after 
diffident branches. We present the micro-architecture of the PolyPath processor, which is 
an extension of an aggressive superscalar, out-of-order architecture. The PolyPath 
architecture uses a novel instruction tagging and ... 



20 Using Dynamic Binary Translation to Fuse Dependent Instructions
Shiliang Hu, James E. Smith 

March 2004 Proceedings of the international symposium on Code generation and 

optimization: feedback-directed and runtime optimization CGO '04 
Publisher: IEEE Computer Society 

Full text available: pdf(240.50 KB) Additional Information: full citation, abstract, citings, index terms

Instruction scheduling hardware can be simplified and easily pipelined if pairs of dependent
instructions are fused so they share a single instruction scheduling slot. We study an
implementation of the x86 ISA that dynamically translates x86 code to an underlying
ISA that supports instruction fusing. A microarchitecture that is co-designed with the fused
instruction set completes the implementation. In this paper, we focus on the dynamic
binary translator for such a co-designed x86 virtual machine. The dy ...